|  |  |  |  |  |
| --- | --- | --- | --- | --- |
| PROF. AKKARY | DEPT. OF ELECTRICAL AND COMPUTER ENGINEERING | | | December 14, 2015 |
|  | AMERICAN UNIVERSITY OF BEIRUT | | |  |
|  | **EECE 422 – Parallel Computer Architecture and Programming** | | |  |
|  | **Final Exam – Fall 2015** | | |  |
|  |  | | |  |
| **NAME**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ | |  |  | |
| **ID**: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ | |  |  | |

**INSTRUCTIONS:**

* **The duration of the exam is TWO hours. No time extension.**
* **The exam is closed-book/closed-notes.**
* **Using Cell phones is not allowed in the examination room.**
* **Write your name and ID. NumBer in the space provided above.**
* **Circle only one answer.**
* **READ THE QUESTIONS CAREFULLY BEFORE ANSWERING.**
* **in some questions, more than one choice may be a valid answer. Circle the best choice you think is the most appropriate answer to the question.**
* **ALL QUESTIONS ARE EQUALLY WEIGHTED.**
* **There is no penalty for wrong answers.**
* **Use the back pages for scratch if needed**
* **Check that you have a total of 4 pages.**
* **No questions are allowed.**
* **You cannot leave the exam room for any reason until you complete the exam.**

1. Which of the following statements is **TRUE?**

a. MPI programs do not run on centralized memory multiprocessors with snoopy cache coherence hardware.

b. MPI programs do not run on distributed memory multiprocessors with directory-based cache coherence.

c. Both “a” and “b” statements above are false.

d. Both “a” and “b” statements above are true.

e. OpenMP can only be run on centralized memory multiprocessors, while MPI can only be run on distributed memory multiprocessors.

1. Which of the following statements about the MSI coherence protocol is **TRUE?**

a. The Invalid state means that the cache block has not been read or written by the CPU.

b. The Shared state means that the cache block is in more than one cache.

c. The Shared state means that the cache block has been read in the cache but has not been modified.

d. The Exclusive state means that the cache block has been read in one and only one cache.

e. None of the above is TRUE.

1. A CPU write hits a block in shared state and the block is in the cache of another processor. Which of the following statements is **TRUE?**

a. The other processor writes back its block and invalidates the block in its cache.

b. The other processor invalidates the block in its cache without writing the block back.

c. The other processor does nothing.

d. The CPU writes its cache without sending any bus request since it is a hit to the cache.

e. None of the above.

1. A CPU write misses and the block is in exclusive-modified state in another processor. Which of the following statements is **TRUE?**

a. The other processor writes back its block and invalidates the block in its cache.

b. The other processor writes back the block and changes its state to Shared.

c. The other processor does nothing.

d. The CPU that misses sends an invalidate bus request.

e. None of the above.

1. A CPU read misses a block in shared state and the block is in shared state in the cache of another processor. Which of the following statements is **TRUE?**

a. The other processor writes back its block and invalidates the block in its cache.

b. The other processor writes back its block without invalidating the block in its cache.

c. The other processor invalidates its block without writing it back.

d. The other processor does nothing.

e. None of the above.

1. In a multiprocessor system that uses directory-based cache coherence, a CPU write misses a block in shared state and the miss block is in shared state in two other caches. Assuming that the block home directory is in a fourth node. Which of the following statements is **TRUE?**

a. There will be 2 cache coherence messages.

b. There will be 3 cache coherence messages.

c. There will be 4 cache coherence messages.

d. There will be 5 cache coherence messages.

e. None of the above.

1. A shared memory distributed multiprocessor computer consists of 8 nodes. Each node contains 2-way set associative cache with total capacity 16K bytes and block size of 32 bytes. Each node contains 512 Kbytes (219 bytes) of DRAM and a directory for maintaining directory-based cache coherence. Which of the following statements is **TRUE**?
2. Each node directory consists of 256 entries.
3. Each node directory consists of 512 entries.
4. Each node directory consists of 16K entries.
5. Each node directory consists of 64K entries.
6. Directory size depends on the number of tag bits it uses.
7. A cache has a total capacity of 16K bytes. It is implemented as 2-way set associative, with block size of 32 bytes. The physical address on the machine consists of 32 bits. Which of the following statements is **TRUE**?
8. Number of set index bits = 7 and number of tag bits = 20
9. Number of set index bits = 8 and number of tag bits = 19
10. Number of set index bits = 9 and number of tag bits = 18
11. Number of set index bits = 8 and number of tag bits = 20
12. None of the above is TRUE.
13. Which of the following statements about memory consistency is **TRUE**:
14. The memory consistency model defines how loads and stores from different processors can be ordered in a shared memory system.
15. The memory consistency model does not allow a processor to reorder its loads and stores to global memory.
16. Memory consistency is something that only programmers need to be concerned about to write correct programs.
17. The memory consistency model is determined by the cache coherence hardware.
18. None of the above statements is TRUE.
19. Which of the following statements about processor consistency is **TRUE**:
20. Processor consistency does not allow a CPU to issue loads to memory according to sequential order.
21. Processor consistency allows a CPU to issue stores to memory in any order.
22. Processor consistency allows a CPU to issue loads and stores to memory in any order.
23. Processor consistency does not allow loads from different CPUs to issue to memory in arbitrary order.
24. Processor consistency does not allow stores from different CPUs to issue to memory in arbitrary order.
25. Which of the following statements about weak memory consistency is **FALSE**:
26. All memory operations from the CPU must complete before a fence instruction later in the program executes.
27. The memory operations from the CPU cannot issue to memory until an earlier fence in the program executes.
28. All memory operations from all the CPUs must complete before the fence executes.
29. Fence instructions from the same CPU cannot be reordered.
30. Two fence instructions from two different CPUs could be in any order.
31. Which of the following statements about locks and critical sections is **TRUE**:
32. Critical sections with high lock contention have high data conflicts.
33. Critical sections with low lock contention have low data conflicts.
34. Both “a” and “b” are TRUE.
35. Neither “a” nor “b” is TRUE.
36. The answer depends on whether the CPU uses transactional memory.
37. Which of the following statements is **TRUE**:
38. A spin lock is a software technique to implement non-blocking lock.
39. A spin lock is a software technique to implement blocking lock.
40. A spin lock is a software technique to implement a non-blocking send message.
41. A spin lock is a software technique to implement a blocking send message.
42. None of the above is TRUE.
43. Which of the following statements about locks and critical sections is **TRUE**:
44. Critical sections with high lock contention have high data conflicts.
45. Software that uses test-and-set to acquire a lock executes the test-and-set instruction atomically.
46. Software that uses test-and-set to acquire the lock also uses test-and-set instruction to release the lock.
47. Hardware executes the test-and-set atomically.
48. Hardware executes the test-and-set and the critical section atomically.
49. Which of the following statements about speculative lock elision is **FALSE**:
50. Speculative lock elision hardware executes test-and-set without invalidating other caches.
51. Speculative lock elision hardware does not set the lock variable in the cache.
52. Speculative lock elision hardware executes a critical section atomically.
53. Speculative lock elision hardware provides code backward compatibility.
54. Speculative lock elision does not benefit critical sections with high data conflicts.

1. Which of the following statements about parallel programming is **FALSE**:
2. You can have multiple parallel regions in an OpenMP program but not in an MPI program.
3. Both OpenMP and MPI can run on shared bus multiprocessor that uses snoopy cache coherence.
4. Both OpenMP and MPI require thread synchronization to access shared data.
5. A programmer can write a set of library functions to send and receive messages between OpenMP threads.
6. You need cache coherence hardware to execute OpenMP programs efficiently.
7. Which of the following pair of phrases are least related:
8. MMX processor and vector processor.
9. Chaining on Cray vector machine and pipelining.
10. Uniform Memory multiprocessor and centralized memory multiprocessor.
11. Memory consistency and cache coherence.
12. Array processor and SIMD architecture.
13. Assume that a multiplication takes 4 cycles and an addition takes 4 cycles to execute. Without chaining, the number of cycles it takes on the Cray 1 vector machine to compute the expression aV1 + V2 is:
14. 8 cycles.
15. 256 cycles.
16. 260 cycles
17. 512 cycles.
18. None of the above
19. Assume that a multiplication takes 4 cycles and an addition takes 4 cycles to execute. With chaining, the number of cycles it takes on the Cray 1 vector machine to compute the expression aV1 + V2 is:
20. 8 cycles.
21. 256 cycles.
22. 260 cycles
23. 512 cycles.
24. None of the above.
25. Which of the following network topologies are most related:
26. Hypercube and Mesh
27. Mesh and Torus
28. Hypercube and Tree
29. Ring and Crossbar
30. Omega and Torus
31. Which of the following statements about communication latency in direct networks is **TRUE**:
32. Communication latency does not depend on the state of the network.
33. Communication latency mostly depends on the route.
34. Communication latency does not depend on the network topology.
35. Communication latency depends on blockage time.
36. Communication latency is less important as a performance metric than communication bandwidth.
37. Which of the following statements about routing methods is **TRUE**:
38. Source routing has higher switching overhead than adaptive routing.
39. Adaptive routing has higher switching overhead than source routing.
40. Source routing and adaptive routing has about the same switching overhead.
41. It depends on the type of switching used.
42. None of the above is TRUE.
43. Which of the following statements about packet routing is **FALSE**:
44. In store-and-forward routing, a router waits for the whole packet to arrive before sending it to the next node.
45. In cut-through routing a router examines the header flit and forwards the header before the whole packet arrives.
46. In wormhole routing, a router examines the header flit and forwards the header before the whole packet arrives.
47. In wormhole routing, when the header blocks, the router has to buffer the full packet.
48. In cut-through routing, when the header blocks, the router has to buffer the full packet.
49. Which of the following statements about packet routing is **FALSE**:
50. In store-and-forward routing, communication latency of a large message is proportional to the route distance.
51. In store-and-forward routing, communication latency of a small message is proportional to the route distance.
52. In wormhole routing, communication latency of a large message is proportional to the route distance.
53. In wormhole routing, communication latency of a large message is mostly independent of the route distance.
54. In wormhole routing, communication latency of a message is inversely proportional to the channel bandwidth.
55. Which of the following statements about the Turn model in routing is **TRUE**:
56. The Turn model avoids deadlocks by adding more buffers.
57. The Turn model avoids deadlocks by adding virtual channels.
58. The Turn model avoids deadlocks by preemption of buffers.
59. The Turn model avoids deadlocks by prohibiting some turns.
60. The Turn model avoids deadlocks by turning the route in a different direction when blockage is encountered.
61. Bonus question: which of the following statements about dimension order routing is **TRUE**:
62. Dimension order routing is an example of Turn model routing.
63. Dimension order routing is an example of adaptive routing.
64. Dimension order routing avoids deadlocks by adding virtual channels.
65. None of the above is TRUE.